Abstract
The following report contains the exercises requested in problem set 1. The first part presents the proofs of some properties and results related to AR, MA and ARMA processes. In the second part, the Box-Jenkins methodology is applied to study three series of the Chilean economy: inflation, the exchange rate and the IPSA. One of the most important results of both exercises concerns how to apprehend time series structures. Whether theoretically or empirically, we can say something that Wold’s theorem had already anticipated: “Any stationary series can be expressed as the sum of two components: a perfectly forecastable series and a moving average of possibly infinite order.”
ARMA models have been presented as a parsimonious tool to describe stationary stochastic processes. In theory, a stationary series can be represented by an MA(\(\infty\)), i.e., capturing the entire memory of the series.
In practice this is very expensive, so we will show how we can approximate an MA(\(\infty\)) with an ARMA(\(p,q\)) model that has few parameters (i.e. \(p+q\) is small). We will be guided by the methodology of Box and Jenkins to achieve this task.
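The link between a small ARMA(\(p,q\)) and its MA(\(\infty\)) representation can be illustrated with a minimal sketch. The function name `arma_to_ma` and the coefficients 0.5 and 0.3 are hypothetical, chosen only for illustration; the sketch assumes the convention \(y_t = \sum_i \phi_i y_{t-i} + \varepsilon_t + \sum_j \theta_j \varepsilon_{t-j}\).

```python
def arma_to_ma(ar, ma, n_terms):
    """Return the first n_terms weights psi_0, psi_1, ... of the MA(infinity)
    representation of an ARMA model with AR coefficients `ar` and MA
    coefficients `ma` (hypothetical helper, not taken from the report)."""
    psi = []
    for k in range(n_terms):
        # Contribution of the MA polynomial: 1 at lag 0, ma[k-1] up to lag q.
        if k == 0:
            value = 1.0
        elif k <= len(ma):
            value = ma[k - 1]
        else:
            value = 0.0
        # AR recursion: psi_k += sum_i ar[i] * psi_{k-1-i}.
        for i, phi in enumerate(ar):
            if k - 1 - i >= 0:
                value += phi * psi[k - 1 - i]
        psi.append(value)
    return psi


# Two parameters generate an infinite, geometrically decaying set of weights.
weights = arma_to_ma(ar=[0.5], ma=[0.3], n_terms=6)
print(weights)
```

With only two parameters the weights decay geometrically after the first lag, which is the sense in which a low-order ARMA can stand in for a long moving average.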
First, in order to use ARMA models, the non-stationary components (“trends around the mean” or “trends around the variance”) need to be removed. In addition to using transformations, we run a unit root test (the Dickey-Fuller test).
Second, other deterministic components are removed. In our case this is important because before 2001 there is a clear inflationary path that disappears afterwards, evidently due to the change in the monetary policy regime (3% rule).
Third, we compute the ACF and PACF to identify the order and type of the underlying model.
Fourth, the model is estimated with the proposed orders \(p\) and \(q\).
Fifth, diagnostic tests are performed and the adequacy of the identification is evaluated. In this report we give particular weight to the AIC and the Ljung-Box test.
Finally, in-sample predictions of the estimated model are made.
Figure 1. Series infsv_sa (CPI), IPSA_sa (IPSA), tcn_sa (Exchange Rate CLP/USD) (1990-2022)
Although the series used are seasonally adjusted, as can be seen in Figure 1, inflation (measured by the consumer price index) presents a clear trend before 2001. This price growth trend stabilized after the Central Bank set an inflation target of around 3% and adopted a policy of nominalization (Fuentes et al., 2003). Similarly, since 2020 the health crisis caused by the COVID-19 pandemic has also been reflected in an increase in the cost of living.
In order to isolate the trends mentioned above, we have chosen to limit the period of analysis from 2001 to 2020, both for inflation and for the other variables of interest, in order to make the models more comparable. We will use the series shown in Figure 2 for the following steps.
Figure 2. Series infsv_sa (Inflation), ipsa_sa (IPSA), tcn_sa (Exchange Rate CLP/USD) (2001-2022)
Inflation
First, Figure 3 shows a clear increasing trend in the price level. As mentioned at the beginning, ARMA models work on the basis of stationary series, but graphically inflation still appears to have a trend component. We will use the Dickey-Fuller unit root test to assess whether there is evidence of this trend. Formally,
\[\Delta y_t = \alpha + \phi y_{t-1} + \varepsilon_t\]
\(H_0: \phi = 0 \Rightarrow\) Presence of a stochastic trend in the observations.
\(H_1: \phi < 0 \Rightarrow\) No presence of a stochastic trend in the observations.
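The regression above can be sketched in a few lines. This is a simplified illustration of the plain Dickey-Fuller regression with a constant and no lagged differences, not the augmented test reported in the tables. The function name `dickey_fuller`, the simulated series and the seed are assumptions; -2.86 is the asymptotic 5% critical value for the constant-only case.

```python
import math
import random


def dickey_fuller(y):
    """OLS of dy_t on a constant and y_{t-1}.
    Returns (phi_hat, t_stat) for H0: phi = 0 (hypothetical helper)."""
    x = y[:-1]                                        # y_{t-1}
    dy = [b - a for a, b in zip(y[:-1], y[1:])]       # first differences
    n = len(x)
    x_bar = sum(x) / n
    dy_bar = sum(dy) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (di - dy_bar) for xi, di in zip(x, dy))
    phi_hat = sxy / sxx
    alpha_hat = dy_bar - phi_hat * x_bar
    resid = [di - alpha_hat - phi_hat * xi for xi, di in zip(x, dy)]
    s2 = sum(r * r for r in resid) / (n - 2)          # residual variance
    t_stat = phi_hat / math.sqrt(s2 / sxx)
    return phi_hat, t_stat


rng = random.Random(0)

# Simulated stationary AR(1) with coefficient 0.5 (made-up data).
stationary = [0.0]
for _ in range(499):
    stationary.append(0.5 * stationary[-1] + rng.gauss(0, 1))

# Simulated random walk (made-up data); H0 is usually not rejected here.
walk = [0.0]
for _ in range(499):
    walk.append(walk[-1] + rng.gauss(0, 1))

CRITICAL_5PCT = -2.86  # asymptotic value, regression with constant only
for name, series in [("stationary", stationary), ("random walk", walk)]:
    phi_hat, t_stat = dickey_fuller(series)
    print(name, round(phi_hat, 3), round(t_stat, 3), t_stat < CRITICAL_5PCT)
```

The statistic must be compared with Dickey-Fuller critical values rather than with the usual Student t table, because under the null the regressor is non-stationary.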
| method | p-value | statistic | parameter | alternative | result (95%) |
|---|---|---|---|---|---|
| Augmented Dickey-Fuller Test | 0.2727477 | -2.720626 | 6 | stationary | Unit root present |
Table 1 shows that, at the 95% confidence level, we cannot reject the null hypothesis. That is, the evidence is consistent with a stochastic trend in this series. Our calculations show that it is a trend in the mean, so it can be removed with a simple differencing (if it were a trend in the variance, a logarithmic transformation would be appropriate). After the transformation we plot the series in Figure 4. Table 2 shows that we can now reject the null hypothesis at the 95% confidence level.
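The two transformations mentioned above can be sketched as follows. The helper names and the numbers in `level` are made up for illustration and are not the inflation data.

```python
import math


def difference(series, lag=1):
    """First difference: removes a trend in the mean."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def log_transform(series):
    """Natural logarithm: commonly used to stabilize a growing variance."""
    return [math.log(v) for v in series]


level = [100.0, 101.0, 103.0, 106.0, 110.0]  # made-up index values
print(difference(level))                      # [1.0, 2.0, 3.0, 4.0]
print(difference(log_transform(level)))       # approximate growth rates
```

Differencing shortens the series by one observation per application, which matters when comparing information criteria across differently transformed samples.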
| method | p-value | statistic | parameter | alternative | result (95%) |
|---|---|---|---|---|---|
| Augmented Dickey-Fuller Test | 0.01 | -7.911852 | 6 | stationary | I(0), no unit root |
We will now explore the orders of the AR and MA processes. On the one hand, the ACF gives us information about the order \(q\) of the MA part. The figure is not very clear about whether the value is 1 or much higher (there are values near 14). On the other hand, the partial autocorrelation function (PACF) gives us the value of \(p\), i.e., the order of the AR(\(p\)) process. The figure shows with much more certainty that the process “dies” between 3 and 5. The value 5 may appear plausible only because of the width of the confidence interval.
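How the two correlograms are computed can be sketched as follows: the sample ACF comes from lagged cross-products, and the PACF from the Durbin-Levinson recursion. The function names, the simulated AR(1) series and the seed are assumptions for illustration; the band \(\pm 1.96/\sqrt{n}\) is the usual approximate 95% interval under white noise.

```python
import math
import random


def acf(y, max_lag):
    """Sample autocorrelations rho_0 .. rho_max_lag."""
    n = len(y)
    mean = sum(y) / n
    c0 = sum((v - mean) ** 2 for v in y)
    return [
        sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def pacf(y, max_lag):
    """Partial autocorrelations via Durbin-Levinson: [1, phi_11, ..., phi_hh]."""
    rho = acf(y, max_lag)
    out = [1.0]
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


# Simulated AR(1) with coefficient 0.6 (made-up data, not the report's series).
rng = random.Random(1)
y = [0.0]
for _ in range(1999):
    y.append(0.6 * y[-1] + rng.gauss(0, 1))

r = acf(y, 5)
p = pacf(y, 5)
band = 1.96 / math.sqrt(len(y))  # approximate 95% band under white noise
for k in range(1, 6):
    print(k, round(r[k], 3), round(p[k], 3), abs(p[k]) > band)
```

For a pure AR(\(p\)) process the PACF should cut off after lag \(p\) while the ACF decays gradually; for a pure MA(\(q\)) the roles are reversed.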
| Fit | sigma | logLik | AIC | BIC | Ljung-Box test on residuals (p-value) |
|---|---|---|---|---|---|
| ARMA(3, 4) | 0.1266875 | 156.4905 | -294.9810 | -263.6553 | 0.9888481 |
| ARMA(5, 2) | 0.1278088 | 156.1140 | -294.2279 | -262.9022 | 0.9870518 |
| ARMA(5, 3) | 0.1280881 | 156.1609 | -292.3218 | -257.5155 | 0.9719941 |
| ARMA(4, 3) | 0.1287374 | 154.7641 | -291.5282 | -260.2024 | 0.4883086 |
| ARMA(5, 5) | 0.1260530 | 157.6999 | -291.3998 | -249.6321 | 0.8613235 |
| ARMA(5, 4) | 0.1283396 | 156.1747 | -290.3494 | -252.0624 | 0.9737545 |
| ARMA(1, 1) | 0.1310111 | 148.5588 | -289.1176 | -275.1951 | 0.9331472 |
| ARMA(3, 5) | 0.1287390 | 153.7904 | -287.5808 | -252.7744 | 0.9201097 |
| ARMA(2, 1) | 0.1312204 | 148.6825 | -287.3650 | -269.9618 | 0.9902070 |
| ARMA(1, 2) | 0.1312435 | 148.6406 | -287.2812 | -269.8780 | 0.9592657 |
| ARMA(4, 5) | 0.1288373 | 154.1374 | -286.2748 | -247.9878 | 0.8734898 |
| ARMA(2, 2) | 0.1314526 | 148.7664 | -285.5329 | -264.6490 | 0.9784873 |
A function has been created to rank the models according to their fit, considering the AIC (information criterion), the Ljung-Box portmanteau test, which checks whether any of a group of residual autocorrelations is non-zero, and the log-likelihood (logLik). With this information, the function penalizes the ARMA models with higher order \(p+q\). That is why we select the model ARMA(3, 4), which has an AIC of -294.9810181. The estimated parameters are:
| term | estimate | std.error | 2.5 % | 97.5 % |
|---|---|---|---|---|
| ar1 | 0.5268825 | 0.0343736 | 0.4595115 | 0.5942536 |
| ar2 | 0.5613215 | 0.0404627 | 0.4820162 | 0.6406269 |
| ar3 | -0.9450220 | 0.0330410 | -1.0097813 | -0.8802627 |
| ma1 | -1.1044121 | 0.0944051 | -1.2894427 | -0.9193815 |
| ma2 | -0.2978542 | 0.0946102 | -0.4832867 | -0.1124217 |
| ma3 | 1.2724179 | 0.0954817 | 1.0852773 | 1.4595586 |
| ma4 | -0.4708067 | 0.0753000 | -0.6183920 | -0.3232214 |
| intercept | -0.0009775 | 0.0037623 | -0.0083514 | 0.0063965 |
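The selection step that led to ARMA(3, 4) could be sketched along the following lines. This is not the function used in the report, only an illustration under two assumptions: that the parameter count is \(k = p + q + 2\) (intercept and innovation variance), which reproduces the AIC values of the ranking table, and that models whose residual Ljung-Box p-value falls below 0.05 are discarded. The names `aic` and `rank_models` are hypothetical.

```python
# (p, q, logLik, Ljung-Box p-value) copied from four rows of the ranking table.
candidates = [
    (2, 2, 148.7664, 0.9784873),
    (5, 2, 156.1140, 0.9870518),
    (3, 4, 156.4905, 0.9888481),
    (1, 1, 148.5588, 0.9331472),
]


def aic(loglik, p, q):
    """AIC = 2k - 2 logLik, assuming k = p + q + 2 (an assumption)."""
    k = p + q + 2
    return 2 * k - 2 * loglik


def rank_models(rows, alpha=0.05):
    """Drop models with autocorrelated residuals, then sort by AIC."""
    kept = [row for row in rows if row[3] > alpha]
    return sorted(kept, key=lambda row: aic(row[2], row[0], row[1]))


for p, q, loglik, lb_p in rank_models(candidates):
    print(f"ARMA({p}, {q})  AIC = {aic(loglik, p, q):.4f}  Ljung-Box p = {lb_p:.4f}")
```

Because the AIC adds 2 for every extra coefficient, a larger model is preferred only when its log-likelihood improves by more than one unit per added parameter.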
The autocorrelation function (ACF) of the residuals is shown in the correlogram. As can be seen, the correlogram “dies” at zero, which is consistent with white noise. This tells us that the residuals have no remaining structure, so the model appears well specified and the residuals retain no information about the series.
The Ljung-Box test gives us a robustness check: no autocorrelation is detected at any lag of the series (see order equal to 10 in Figure 6 below).
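The Ljung-Box statistic itself is simple to compute from the residual autocorrelations, \(Q = n(n+2)\sum_{k=1}^{h} \hat\rho_k^2/(n-k)\). The sketch below is illustrative only: the function names, the autocorrelation values and the sample size are made up, and the p-value uses a closed form of the chi-square tail that is valid only for an even number of degrees of freedom. When the test is applied to ARMA residuals, the degrees of freedom are commonly reduced by \(p+q\), which this sketch does not do.

```python
import math


def ljung_box_q(rho, n):
    """Q statistic from rho = [rho_1, ..., rho_h] and sample size n."""
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j          # (x/2)^j / j!
        total += term
    return math.exp(-half) * total


# Made-up residual autocorrelations at lags 1..10 and a made-up sample size.
rho_hat = [0.03, -0.05, 0.02, 0.04, -0.01, 0.06, -0.03, 0.01, 0.02, -0.04]
n_obs = 240
q_stat = ljung_box_q(rho_hat, n_obs)
p_value = chi2_sf_even(q_stat, df=len(rho_hat))
print(round(q_stat, 4), round(p_value, 4))
```

A large p-value means the joint hypothesis of zero autocorrelation up to lag \(h\) is not rejected, which is the reading given to the values near 1 in the tables.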
The last step of Box-Jenkins corresponds to prediction. As the figure shows, the values predicted by the ARMA model follow the observed series very closely.
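In-sample one-step-ahead predictions can be sketched with a simple recursion. This is a simplified illustration that sets all pre-sample values and innovations to zero; it is not necessarily how the report's software produces its fitted values. The function name, coefficients and data are made up.

```python
def one_step_predictions(y, ar, ma, mean=0.0):
    """One-step-ahead in-sample predictions of an ARMA model written in
    deviations from `mean`, with pre-sample terms set to zero."""
    preds = []
    eps = []  # one-step prediction errors, reused by the MA part
    for t in range(len(y)):
        pred = mean
        for i, phi in enumerate(ar):
            if t - 1 - i >= 0:
                pred += phi * (y[t - 1 - i] - mean)
        for j, theta in enumerate(ma):
            if t - 1 - j >= 0:
                pred += theta * eps[t - 1 - j]
        preds.append(pred)
        eps.append(y[t] - pred)
    return preds


observed = [1.0, 2.0, 4.0]  # made-up values
print(one_step_predictions(observed, ar=[0.5], ma=[]))   # [0.0, 0.5, 1.0]
print(one_step_predictions(observed, ar=[], ma=[0.5]))   # [0.0, 0.5, 0.75]
```

Each prediction uses only observations up to the previous period, so a close in-sample fit shows how well the model tracks the series one step ahead, not its performance at longer horizons.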
IPSA
The IPSA series (Chile’s main stock market index) is presented in Figure 8. As can be seen in Table 5, at the 95% confidence level the null hypothesis can be rejected. Thus, at the 5% significance level, there is evidence that the series presented has no stochastic trend.
| method | p-value | statistic | parameter | alternative | result (95%) |
|---|---|---|---|---|---|
| Augmented Dickey-Fuller Test | 0.01 | -5.043973 | 6 | stationary | I(0), no unit root |
Regarding the correlograms used to choose the model orders, neither shows a “smooth” decay towards any order; rather, the values stay almost always within the confidence interval. They only fall outside the interval at order 12, which suggests some annual memory in the series. Setting the confidence intervals aside, the ACF and PACF look quite symmetrical (judging from the similarity of the figures). Thus, it is possible that the significant drop occurs after order \(p, q > 3\).
Table 6 shows the 12 best combinations of ARMA(\(p,q\)). As mentioned before, a function has been created to rank them considering the number of parameters, the AIC and, above all, the residual test.
| Fit | sigma | logLik | AIC | BIC | Ljung-Box test on residuals (p-value) |
|---|---|---|---|---|---|
| ARMA(3, 3) | 4.065532 | -678.7555 | 1373.511 | 1401.389 | 0.7484041 |
| ARMA(1, 1) | 4.164760 | -684.2937 | 1376.587 | 1390.527 | 0.4704815 |
| ARMA(4, 2) | 4.119725 | -680.3639 | 1376.728 | 1404.606 | 0.9895842 |
| ARMA(1, 2) | 4.163729 | -683.7363 | 1377.473 | 1394.897 | 0.9721260 |
| ARMA(2, 1) | 4.164388 | -683.7731 | 1377.546 | 1394.970 | 0.9903279 |
| ARMA(5, 5) | 4.062058 | -676.9303 | 1377.861 | 1419.678 | 0.9573418 |
| ARMA(4, 3) | 4.127846 | -680.2603 | 1378.521 | 1409.884 | 0.9887073 |
| ARMA(5, 2) | 4.129013 | -680.3567 | 1378.713 | 1410.077 | 0.9896419 |
| ARMA(1, 3) | 4.170762 | -683.6374 | 1379.275 | 1400.184 | 0.9715892 |
| ARMA(2, 2) | 4.172423 | -683.7272 | 1379.454 | 1400.363 | 0.8881015 |
| ARMA(3, 1) | 4.172472 | -683.7327 | 1379.465 | 1400.374 | 0.9854579 |
| ARMA(4, 1) | 4.163871 | -682.7420 | 1379.484 | 1403.878 | 0.9915723 |
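As a consistency check on the first row of the table above, the AIC can be recovered from the log-likelihood. The parameter count \(k = p + q + 2\) (the ARMA coefficients plus the intercept and the innovation variance) is an assumption, not something stated in the report, but it reproduces the tabulated value:

\[\mathrm{AIC} = 2k - 2\log L = 2(3 + 3 + 2) - 2(-678.7555) = 16 + 1357.511 = 1373.511\]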
We select the model ARMA(3, 3), with an AIC of 1373.4120602. The estimated parameters are:
| term | estimate | std.error | 2.5 % | 97.5 % |
|---|---|---|---|---|
| ar1 | 0.8712972 | 0.3745375 | 0.1372171 | 1.6053772 |
| ar2 | 0.5023087 | 0.6306094 | -0.7336630 | 1.7382803 |
| ar3 | -0.7528798 | 0.3356288 | -1.4107001 | -0.0950595 |
| ma1 | -0.9396423 | 0.3870535 | -1.6982532 | -0.1810313 |
| ma2 | -0.4191528 | 0.6751012 | -1.7423268 | 0.9040213 |
| ma3 | 0.8107539 | 0.3858066 | 0.0545869 | 1.5669208 |
| intercept | 0.6433242 | 0.3069293 | 0.0417539 | 1.2448946 |
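The interval columns in the table above are consistent with normal-approximation (Wald) intervals, estimate \(\pm z_{0.975}\) times the standard error. That reading is an assumption based on the numbers, and the function name `wald_interval` is hypothetical. A short sketch shows how to check which coefficients have intervals that contain zero:

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, level=0.95):
    """Normal-approximation confidence interval for one coefficient."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


# (term, estimate, std.error) copied from three rows of the table above.
rows = [
    ("ar1", 0.8712972, 0.3745375),
    ("ar2", 0.5023087, 0.6306094),
    ("ma2", -0.4191528, 0.6751012),
]

for term, est, se in rows:
    low, high = wald_interval(est, se)
    verdict = "contains zero" if low <= 0.0 <= high else "excludes zero"
    print(term, round(low, 4), round(high, 4), verdict)
```

Coefficients whose interval contains zero (here ar2 and ma2) are not individually significant at the 5% level, which is worth weighing against the AIC ranking when choosing the final order.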
The autocorrelation function (ACF) of the residuals is shown in the correlogram. As can be seen, the correlogram “dies” at zero, which is consistent with white noise. This tells us that the residuals have no remaining structure, so the model appears well specified and the residuals retain no information about the series.
The Ljung-Box test gives us a robustness check: no autocorrelation is detected at any lag of the series (see order equal to 24 in Figure 9 below).
Unlike the inflation series, the IPSA forecast does not follow the observed values as closely. In particular, the forecasts fail to reproduce the sharp peaks (the kurtosis) of the observed curve.
3. Exchange Rate
Figure 11 shows the exchange rate from Chilean pesos to dollars. It shows only one large shock, due to the 2008 crisis, but in general it remains around its average. Table 8 shows that the unit root null hypothesis is rejected, so at the 5% significance level there is no evidence of a stochastic trend in this series.
Regarding the orders of the models (Figure 12), these are at least clearer than in the case of the IPSA. Here the orders do not appear to be symmetric, with \(p > q\). Both seem to lie between 2 and 3, but neither correlogram is “smoothly decaying”.
| method | p-value | statistic | parameter | alternative | result (95%) |
|---|---|---|---|---|---|
| Augmented Dickey-Fuller Test | 0.01 | -5.594978 | 6 | stationary | I(0), no unit root |
Table 9 shows the model selection, which confirms what was discussed for the previous figure: the pairs (3,2), (4,2) and (2,3) are those that lose the least information, and their residual autocorrelation tests are the least significant (for (3,3) the p-value is already 0.84).
| Fit | sigma | logLik | AIC | BIC | Ljung-Box test on residuals (p-value) |
|---|---|---|---|---|---|
| ARMA(3, 2) | 2.380310 | -548.1035 | 1110.207 | 1134.600 | 0.9592174 |
| ARMA(4, 2) | 2.375450 | -547.1140 | 1110.228 | 1138.106 | 0.9793663 |
| ARMA(2, 3) | 2.381598 | -548.2437 | 1110.487 | 1134.881 | 0.9573739 |
| ARMA(3, 3) | 2.377868 | -547.3355 | 1110.671 | 1138.549 | 0.8419124 |
| ARMA(2, 5) | 2.375085 | -546.5865 | 1111.173 | 1142.536 | 0.9521570 |
| ARMA(1, 1) | 2.402211 | -551.7083 | 1111.417 | 1125.356 | 0.9867599 |
| ARMA(5, 2) | 2.378433 | -546.8993 | 1111.799 | 1143.162 | 0.9789216 |
| ARMA(2, 2) | 2.393469 | -549.9395 | 1111.879 | 1132.788 | 0.2429308 |
| ARMA(4, 3) | 2.380006 | -547.0524 | 1112.105 | 1143.468 | 0.9945415 |
| ARMA(2, 4) | 2.385934 | -548.1558 | 1112.312 | 1140.190 | 0.9863368 |
| ARMA(1, 2) | 2.403171 | -551.3031 | 1112.606 | 1130.030 | 0.9994660 |
| ARMA(2, 6) | 2.378219 | -546.3872 | 1112.774 | 1147.622 | 0.9952449 |
We select the model ARMA(3, 2), with an AIC of 1110.2068976. The estimated parameters are:
| term | estimate | std.error | 2.5 % | 97.5 % |
|---|---|---|---|---|
| ar1 | 0.4944275 | 0.1949472 | 0.1123380 | 0.8765171 |
| ar2 | -0.8592603 | 0.1104293 | -1.0756978 | -0.6428228 |
| ar3 | 0.1498004 | 0.0822274 | -0.0113624 | 0.3109632 |
| ma1 | -0.2140941 | 0.1762483 | -0.5595344 | 0.1313461 |
| ma2 | 0.8647872 | 0.0940674 | 0.6804185 | 1.0491559 |
| intercept | 0.1829407 | 0.2054703 | -0.2197736 | 0.5856550 |
The autocorrelation function (ACF) of the residuals is shown in Figure 13. As can be seen, the correlogram “dies” at zero, which is consistent with white noise. This tells us that the residuals have no remaining structure, so the model appears well specified and the residuals retain no information about the series.
The Ljung-Box test gives us a robustness check: no autocorrelation is detected at any lag of the series (see order equal to 10 in Figure 13 below).
Unlike the IPSA, the exchange rate model fits the empirical series much better, very much as happened with inflation. In fact, this model, the one with the fewest parameters, is the one that “follows the series closely”. This tells us that learning from the series does not require incorporating more variables into the model, but rather depends on how much we can understand of the data generating process we are working with. For example, some of the key questions we have asked so far are: are the series stationary? Do they have a trend? Does adding more orders improve the prediction?